Rational metacognition in memory (all trials)

Summary

Memory recall is often modeled as a process of evidence accumulation. Existing models typically assume that this accumulation process is passive, and not subject to top-down control. In contrast, recent work in perceptual and value-based decision making has suggested that similar kinds of evidence accumulation processes are guided by attention, such that evidence for attended items is accumulated faster than for non-attended items. Furthermore, attention may be adaptively allocated to different items in order to optimize a tradeoff between decision quality and the computational cost of evidence accumulation. In this project, we ask whether similar forces are at play in the context of memory recall.

Such a model predicts that, when multiple memories are relevant, people will focus their efforts on recalling the target which is more strongly represented in memory, because it can be recalled with less effort. Here we present a simple form of such a model, and test this key prediction in a cued-recall experiment in which participants can select which of two possible targets to remember. We find tentative support for a model in which memory search is guided by partial recall progress in order to minimize the time spent recalling.

Model

We model memory recall as a process of evidence accumulation. As in the DDM or LCA, we assume that evidence is sampled at each time step and that recall occurs when the total evidence hits a threshold. To make the model solvable (by dynamic programming) we assume that the evidence for each target follows a Bernoulli distribution \[ x_t \sim \text{Bernoulli}(p), \] where \(p\) corresponds to the strength of the memory. This image shows several possible traces of evidence accumulation for a single item:

knitr::include_graphics("figs/accumulation.png")
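For intuition, here is a minimal Python sketch of this accumulation process (a sketch outside the R analysis pipeline; the threshold of 10 is an arbitrary illustration, not a fitted parameter):

```python
import random

def recall_time(p, threshold=10, max_steps=10_000, rng=None):
    """Simulate one evidence-accumulation trace: x_t ~ Bernoulli(p) is
    summed each step, and recall occurs when the total hits `threshold`.
    Returns the step count, or None if the threshold is never reached."""
    rng = rng or random.Random()
    total = 0
    for t in range(1, max_steps + 1):
        total += rng.random() < p   # one Bernoulli evidence sample
        if total >= threshold:
            return t
    return None

# The mean hitting time is threshold / p (a negative-binomial mean),
# e.g. around 20 steps for p = 0.5 and threshold = 10.
rng = random.Random(0)
times = [recall_time(0.5, rng=rng) for _ in range(2000)]
mean_time = sum(times) / len(times)
```

The negative-binomial mean \(\theta / p\) is also what licenses the log-linear RT mapping used in the memory-strength measure below.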

When multiple memories are relevant, each one has a separate accumulator. Critically, we do not assume that evidence is sampled for each item in parallel. Instead, at each time step, the agent must select one of the targets and accumulates evidence for only that target. This induces a metalevel control problem: which target should the agent focus on at each moment, given only the current state of the accumulators?

This problem can be formalized as a Markov decision process in which the states correspond to the total evidence accumulated and time spent for each item (thus, the state is 4 dimensional). Because we use a discrete accumulation process, we can solve it exactly by dynamic programming. We find that the optimal policy generally converges on the target with maximal memory strength (highest \(p\)) and only draws samples for that target until it is recalled. This is illustrated in the following plot:

knitr::include_graphics("../model/figs/simple_fixation.png")
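To make the dynamic-programming solution concrete, the following Python sketch solves a small version of this metalevel MDP by memoized backward induction. The threshold (3), the horizon cap, and the use of the Beta(2, 6) prior as the agent's belief over \(p\) are illustrative assumptions, not the paper's exact implementation; the agent fixates whichever cue has the lower expected remaining sample count.

```python
from functools import lru_cache

THETA = 3        # evidence threshold (small, for tractability; an assumption)
PRIOR = (2, 6)   # Beta prior on p, matching the Beta(2, 6) sampling distribution
HORIZON = 40     # cap on total samples, beyond which a closed-form tail is used

def posterior_mean(k, n):
    """Posterior mean of p after k successes in n samples (Beta-Bernoulli)."""
    a, b = PRIOR
    return (a + k) / (a + b + n)

def q_values(k1, n1, k2, n2):
    """Expected remaining samples if the next sample goes to cue 1 or cue 2."""
    p1 = posterior_mean(k1, n1)
    q1 = 1 + p1 * value(k1 + 1, n1 + 1, k2, n2) + (1 - p1) * value(k1, n1 + 1, k2, n2)
    p2 = posterior_mean(k2, n2)
    q2 = 1 + p2 * value(k1, n1, k2 + 1, n2 + 1) + (1 - p2) * value(k1, n1, k2, n2 + 1)
    return q1, q2

@lru_cache(maxsize=None)
def value(k1, n1, k2, n2):
    """Expected remaining samples under the optimal fixation policy.
    The 4-dimensional state is (evidence, samples drawn) for each cue."""
    if k1 >= THETA or k2 >= THETA:          # either target recalled: done
        return 0.0
    if n1 + n2 >= HORIZON:                  # tail approximation: E[t] = (theta - k)/p
        return min((THETA - k1) / posterior_mean(k1, n1),
                   (THETA - k2) / posterior_mean(k2, n2))
    return min(q_values(k1, n1, k2, n2))

def best_target(k1, n1, k2, n2):
    """0 or 1: which cue the optimal policy samples next."""
    q1, q2 = q_values(k1, n1, k2, n2)
    return 0 if q1 <= q2 else 1
```

Consistent with the text, once one accumulator pulls ahead (e.g. 2 of 3 successes versus 0), the optimal policy keeps sampling that cue until it is recalled.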

Alternative models

In order to discern which model predictions are specific to a rational meta-memory model, we show predictions of two alternative models. The “Random” model randomly samples fixation durations from the empirical distribution. We also developed a more sophisticated “Random Commitment” model to account for the empirical fact that last fixations are considerably longer than all other fixations. This model samples the total number of fixations on each trial from the empirical distribution and then samples the duration of each from the empirical distribution of non-final fixations. If a target is not recalled before reaching the final fixation, the model continues to fixate on the current cue until the target is recalled or the 15-second time limit is reached.
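As a sketch of this generative process (in Python, with stand-in lists in place of the empirical distributions), where `recall_latency` is a hypothetical input giving the time from the onset of the final fixation to recall:

```python
import random

def random_commitment_fixations(n_fix_dist, nonfinal_durs, recall_latency,
                                time_limit=15_000, rng=None):
    """One trial of the Random Commitment model: draw the number of
    fixations and the non-final durations from (stand-in) empirical
    distributions, then let the final fixation run until recall or
    the 15 s time limit. Times are in milliseconds."""
    rng = rng or random.Random()
    n = rng.choice(n_fix_dist)                         # committed fixation count
    durs = [rng.choice(nonfinal_durs) for _ in range(max(n - 1, 0))]
    elapsed = sum(durs)
    # stay on the final cue until the target is recalled or time runs out
    durs.append(min(recall_latency, max(time_limit - elapsed, 0)))
    return durs

# e.g. a committed count of 3: two sampled non-final durations, then a
# final fixation lasting until recall (2000 ms after the final switch)
fix = random_commitment_fixations([3], [500], 2000, rng=random.Random(1))
```

Note the built-in asymmetry: only the final fixation's duration depends on recall, which is what lets this model reproduce long last fixations without any strength-sensitive control.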

Experiment

To test the model’s predictions, we developed a modified cued-recall experiment in which participants were presented with two cues (images) on each trial and could recall the target (word) associated with either one. To create an observable behavioral correlate of targeted memory search, only one cue is visible at a time and participants use the keyboard to display each in turn. This is basically a cheap alternative to eye-tracking. The assumption is that people will look at the image they are currently trying to remember the word for. See a demo here.

knitr::include_graphics("figs/task.png")

Measuring memory strength

The model predicts that people will spend more time looking at the cue for which the memory of the corresponding target is stronger, that is, the cue for which \(p\) is higher. Unfortunately, we cannot measure \(p\) directly. However, we can collect a noisy signal of this parameter using an auxiliary task. Concretely, we use reaction time in a 2-AFC task in which participants are presented with a word and must select the matching image.

To map this measure onto the model’s \(p\) parameter, we take advantage of the fact that the expected time to reach threshold in the model is \(E[t] = \theta / p\), which implies that \(\log E[t] = \log(\theta) - \log(p)\). This suggests that, in broad strokes, the \(p\) parameter should be log-linearly related to response times. The exact nature of this relationship is unclear, however, given that the 2-AFC task is quite different from a cued-recall task. Thus, we simply normalize (Z-score) the \(\log(p)\) parameter and the 2AFC-RT measure (the latter within subject) to put them on roughly the same scale. Finally, to account for the fact that the 2AFC measure is very noisy, we corrupt \(\log(p)\) with \(\sigma=3\) Gaussian noise.

In the simulations, we sample the parameter \(p\) from a \(\text{Beta}(2, 6)\) distribution, which was chosen because it makes some of the plots look better.
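Putting the sampling, normalization, and noise steps together, a minimal Python sketch of the simulated strength measure (the sample size is arbitrary):

```python
import math
import random
import statistics

def noisy_strength_measure(n_items=500, sigma=3.0, rng=None):
    """Simulate the noisy stand-in for the 2AFC-RT strength signal:
    draw p ~ Beta(2, 6), z-score log(p), then corrupt it with Gaussian
    noise (sigma = 3), mirroring the within-subject normalization
    applied to the empirical reaction times."""
    rng = rng or random.Random()
    logp = [math.log(rng.betavariate(2, 6)) for _ in range(n_items)]
    mu, sd = statistics.mean(logp), statistics.stdev(logp)
    z = [(x - mu) / sd for x in logp]                 # z-scored log(p)
    return [x + rng.gauss(0, sigma) for x in z]       # add measurement noise

measure = noisy_strength_measure(rng=random.Random(0))
```

With \(\sigma = 3\), the noise dominates: the corrupted measure has roughly \(\sqrt{1 + 9} \approx 3.2\) times the spread of the z-scored signal, so the simulated 2AFC proxy is only weakly informative about \(p\), as intended.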

Main Results

Fixation time course

The key model prediction is that the probability of looking at the cue with greater memory strength increases over time.

normalized_timestep = function(long) {
    long %>% 
        group_by(trial) %>%
        # proportion of the trial occupied by each fixation
        mutate(prop_duration = duration / sum(duration)) %>% 
        ungroup() %>% 
        # discretize each trial into 100 normalized time steps,
        # replicating each fixation's row once per step it occupies
        mutate(n_step = round(prop_duration * 100)) %>% 
        uncount(n_step) %>% 
        group_by(trial) %>% 
        mutate(normalized_timestep = row_number())
}

long %>% 
    normalized_timestep %>% 
    drop_na(strength_diff) %>% 
    ggplot(aes(normalized_timestep/100, fix_stronger, group = strength_diff, color=strength_diff)) +
    geom_smooth(se=F) + 
    ylim(0, 1) +
    facet_grid(~name) +
    labs(x="Normalized Time", y="Probability Fixate\nStronger Cue", color="Strength Difference") +
    geom_hline(yintercept=0.5) +
    theme(legend.position="top")

In both the optimal model simulations and the human data, the probability of fixating the stronger cue steadily increases over the course of the trial, and this increase is more pronounced when there is a larger difference in strength.

In the random model, we only see a slight spike at the end of the trial. This spike is due to the fact that the model always remembers the item it looks at last, and it is also more likely to remember the item with a stronger memory.

The random commitment model does not show this effect because the item it fixates last (and thus remembers) is usually determined by the random commitment.

Individual fixation durations

Although the random models we designed did not demonstrate the smoothly increasing probability shown by both humans and the optimal model, it is possible that the aforementioned last-fixation phenomenon could be partially driving that result. We can avoid this confound by looking at the duration of individual fixations, conditioning on the fact that each is not the final fixation. The model predicts that fixations should be longer for stronger cues, and (for non-initial fixations) when the competing cue is weaker.

For each of the first three fixations, we plot model predictions alongside human data. Because we are excluding final fixations, different participants are better represented in some parts of the x-axis than others. This produces Simpson’s-paradox-like effects. Thus, we plot the fixed-effect prediction of a mixed-effects model rather than a generic fixed-effects model. The printed regression tables describe the fixed effects of the plotted random-effects model applied to the human data.

First fixation

df %>% 
    filter(n_pres >= 2) %>% 
    regress(strength_first, first_pres_time) # plot
Fixed Effects
Est. S.E. t val. d.f. p
(Intercept) 1295.693 101.062 12.821 45.052 0.000
strength_first 15.744 25.669 0.613 174.018 0.540
p values calculated using Satterthwaite d.f.

The optimal model predicts a weak positive effect, while humans show, if anything, a slightly negative effect. Previously, the model predicted a stronger effect here. The difference is primarily due to changing the distribution from which \(p\) is drawn. When \(p\) is low, progress signals are sparse, and thus the model cannot detect the memory strength as rapidly (and is therefore less sensitive to memory strength on the first fixation). I’m not sure if this is a reasonable thing to do. Essentially, we are qualitatively “fitting” the evidence accumulation rate to match the trends in the data.
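The sparsity point can be made concrete: early in a trial, the chance that neither accumulator has registered any evidence, leaving the two cues indistinguishable, is much higher when both \(p\) values are low. A quick Python check (the specific \(p\) values and sample count are illustrative):

```python
def p_no_evidence(p1, p2, n):
    """Probability that neither target produces a single Bernoulli success
    in its first n samples, so the two accumulators (and the agent's
    posteriors) remain identical."""
    return (1 - p1) ** n * (1 - p2) ** n

# With weak memories the model often has nothing to go on early in the
# trial, so first-fixation durations are less sensitive to strength.
weak   = p_no_evidence(0.05, 0.15, 5)   # both p low: often no signal yet
strong = p_no_evidence(0.30, 0.50, 5)   # both p high: signal arrives fast
```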

Second fixation

df %>% 
    filter(n_pres >= 3) %>% 
    regress(rel_strength, second_pres_time) # plot
Fixed Effects
Est. S.E. t val. d.f. p
(Intercept) 1090.476 131.081 8.319 35.848 0.000
rel_strength -89.675 49.620 -1.807 45.024 0.077
p values calculated using Satterthwaite d.f.

Previously, we had a significant effect in the human data, but excluding the incorrect trials pushed the p-value up a bit. It does seem plausible that the effect is there, though.

Third fixation

df %>% 
    filter(n_pres >= 4) %>% 
    regress(rel_strength, third_pres_time)
Fixed Effects
Est. S.E. t val. d.f. p
(Intercept) 1366.718 354.319 3.857 16.112 0.001
rel_strength 464.200 170.497 2.723 10.261 0.021
p values calculated using Satterthwaite d.f.

I previously thought this could be due to selection effects, but given that we aren’t seeing anything in either random model, I am no longer so worried about this. However, the fact that the human result is much stronger than the simulation suggests that this might be a fluke.

Overall fixation proportion

The simplest test of rational memory: do people spend more time looking at the cue that they have a stronger memory of? This analysis only considers trials where both cues are seen.

df %>% 
    filter(n_pres >= 2) %>% 
    regress(rel_strength, prop_first)
Fixed Effects
Est. S.E. t val. d.f. p
(Intercept) 0.503 0.014 35.125 52.886 0.000
rel_strength 0.034 0.008 4.115 38.652 0.000
p values calculated using Satterthwaite d.f.

This is suggestive, but there’s a caveat.

Last-fixation effect

People remember the thing they look at last 96% of the time. In value-based decision making, it has been suggested that this “last-fixation effect” could explain away what looks to be evidence for adaptive attention allocation. Briefly, the last fixation is correlated with both fixation proportion and relative strength, and this could be entirely driving the correlation between strength and fixation proportion. See here for a more thorough explanation.

df %>% 
    filter(n_pres >= 2) %>% 
    regress_interaction(rel_strength, last_pres, prop_first)
Fixed Effects
Est. S.E. t val. d.f. p
(Intercept) 0.670 0.021 31.551 39.055 0.000
rel_strength 0.028 0.010 2.896 36.587 0.006
last_pressecond -0.276 0.027 -10.038 43.502 0.000
rel_strength:last_pressecond -0.028 0.012 -2.342 57.030 0.023
p values calculated using Satterthwaite d.f.

The effect of relative memory strength on fixation proportion mostly disappears when we control for the effect of the last fixation. In the optimal model (with these parameters), it disappears completely!

However, we still need to explain why we see the overall proportion effect in both the human data and optimal simulations, but not either of the random models. It seems likely that something rational is at play here.

It turns out, the overall proportion effect is primarily driven (we think) by the interaction between two other effects:

Strength predicts last fixation

df %>% 
    mutate(last_pres_first = as.numeric(last_pres == "first")) %>% 
    regress(rel_strength, last_pres_first)
Fixed Effects
Est. S.E. t val. d.f. p
(Intercept) 0.615 0.032 19.024 63.277 0.000
rel_strength 0.074 0.014 5.231 433.556 0.000
p values calculated using Satterthwaite d.f.

This effect occurs mechanistically because (1) the currently fixated cue is almost always the one remembered, (2) remembering the cue terminates the trial, making this fixation the final one, and (3) the stronger cue is more likely to be remembered.
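A quick Python simulation of a purely random fixation policy illustrates this mechanism: even with no strength-sensitive control, the stronger cue ends up last-fixated on most trials, simply because it crosses threshold first (the threshold, switch probability, and \(p\) values are arbitrary illustrative choices):

```python
import random

def random_policy_trial(p1, p2, threshold=10, switch_prob=0.1,
                        max_steps=2000, rng=None):
    """One trial under a random fixation policy: each step samples
    evidence for the fixated cue and switches cues with a fixed
    probability. Returns the cue fixated when recall occurs (the
    'last fixation'), or None on timeout."""
    rng = rng or random.Random()
    totals = [0, 0]
    fix = rng.randrange(2)                   # start on a random cue
    ps = (p1, p2)
    for _ in range(max_steps):
        totals[fix] += rng.random() < ps[fix]
        if totals[fix] >= threshold:         # recall terminates the trial,
            return fix                       # making this fixation final
        if rng.random() < switch_prob:
            fix = 1 - fix
    return None

# The stronger cue (here cue 0: p = 0.6 vs 0.3) should be last-fixated on
# most trials, purely because it tends to be recalled first.
rng = random.Random(42)
results = [random_policy_trial(0.6, 0.3, rng=rng) for _ in range(1000)]
share_stronger = sum(r == 0 for r in results) / len(results)
```

This is the same logic as steps (1)-(3) above: no adaptive attention is needed for strength to predict the last fixation, only a recall-terminated trial.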

However, this effect doesn’t occur in the random commitment model because the cue it fixates last is usually determined by the random commitment, not by crossing the threshold (only the latter being more likely for stronger cues).

Last fixations are longer

long %>% 
    # filter(name == "Human") %>% 
    ggplot(aes(last_fix==1, duration)) + 
    stat_summary(fun.data=mean_cl_boot, geom="bar", fill="white", color="black") +
    stat_summary(fun.data=mean_cl_boot, geom="errorbar", width=0.2) +
    facet_grid(~name) + 
    scale_x_discrete(name="Fixation Type", labels=c("Non-final", "Final")) +
    ylab("Duration")

last_diff = long %>% 
    filter(name == "Human") %>% 
    with(tapply(duration, last_fix, mean)) %>% 
    diff

This is itself a cool finding because we see the opposite (shorter final fixations) in value-based and perceptual decisions, typically explained as the result of crossing a threshold cutting off the final fixation. The reason we see this in the optimal model is that the trial doesn’t stop when the model decides which cue has a stronger memory (as in value-based and perceptual tasks). Instead, the model needs to continue fixating that cue until it remembers it. By this logic, the long last fixation is evidence that at some point people commit to remembering one of the cues. This is what initially motivated the random commitment model.

Putting the pieces together: rational commitment

So, we’ve seen that the last fixation tends to be on the stronger cue, and this fixation is longer than all the others. It immediately follows that overall, more time will be spent looking at the stronger cue. However, these two effects only hold in the optimal simulations and human data. Each random model can capture one effect, but not the other. Why is this? Well, without a commitment decision, there’s no natural way to get longer final fixations. (Fixations don’t get longer over the course of a trial overall). But committing randomly breaks the relationship between strength and the final fixation. Thus, the only way (I think) you can see both of the above effects is for there to be non-random—in particular, rational—commitment.

Or it could just be the pure random model plus some motor time to select the cue. However, given that last fixations are 908 ms longer, this probably isn’t driving the effect.

…right?

Sanity checks for evidence accumulation

All these plots are testing really basic predictions of the evidence accumulation model, that is, effects that come out in the random model. They are all significant in the human data, but the RT ones are a lot weaker. This suggests that we might need to make some adjustments to the base accumulation model, especially if we wanted to do any real model fitting.

All these plots exclude timeout trials.

Probability of remembering first word

df %>%
    filter(response_type == "correct") %>% 
    mutate(choose_first = as.numeric(choose_first)) %>% 
    regress(rel_strength, choose_first) +
    ylab("Prob Select First Cue")
Fixed Effects
Est. S.E. t val. d.f. p
(Intercept) 0.634 0.030 21.307 62.124 0.000
rel_strength 0.092 0.015 6.243 185.128 0.000
p values calculated using Satterthwaite d.f.

Reaction time by chosen strength

df %>% 
    filter(response_type == "correct") %>% 
    regress(chosen_strength, rt)
Fixed Effects
Est. S.E. t val. d.f. p
(Intercept) 3558.159 160.862 22.119 56.505 0.000
chosen_strength -155.152 66.243 -2.342 39.697 0.024
p values calculated using Satterthwaite d.f.

Last fixation duration by strength

df %>% 
    filter(response_type == "correct") %>% 
    filter(n_pres > 0) %>% 
    mutate(
        last_pres_time = map_dbl(presentation_times, last),
        last_rel_strength = if_else(last_pres == "first", rel_strength, 1 - rel_strength),
        last_strength = if_else(last_pres == "first", strength_first, strength_second)
    ) %>% 
    regress(last_strength, last_pres_time)
Fixed Effects
Est. S.E. t val. d.f. p
(Intercept) 1803.509 75.409 23.916 61.002 0.000
last_strength -36.794 36.689 -1.003 30.975 0.324
p values calculated using Satterthwaite d.f.

Miscellaneous

Timeout rate

df %>% 
    group_by(name) %>% 
    summarise(timeout_rate = mean(response_type == "timeout")) %>% 
    kable(digits=2)
name timeout_rate
Optimal 0.04
Human 0.02
Random 0.11
Random Commitment 0.12

Reaction time

df %>% 
    ggplot(aes(rt)) +
    geom_density() +
    facet_grid(~name) +
    labs(x="Reaction Time", y="Density")

Number of fixations

df %>% 
    filter(n_pres < 10) %>% 
    ggplot(aes(n_pres, ..prop..)) +
    geom_bar() +
    facet_grid(~name) +
    labs(x="Number of Fixations", y="Proportion of Trials")